Week 09: Register Allocation & Garbage Collection

Register Allocation

Register Allocation

Introduction

  • Intermediate code uses unlimited temporaries
  • Typical intermediate code uses too many temporaries
  • Problem
    • Rewrite the intermediate code to use no more temporaries than there are machine registers.
  • Solution
    • Assign multiple temporaries to each register
    • But without changing the program behavior
  • Example
    • Source: a = c + d; e = a + b; f = e - 1;
    • Target: r1 = r2 + r3; r1 = r1 + r4; r1 = r1 - 1;
  • Theorem
    • Temporaries t1 and t2 can share the same register if at any point in the program at most one of t1 or t2 is live.
    • If t1 and t2 are live at the same time, they cannot share a register.
  • Compute live variables for each point
  • Algo
    • Construct an undirected graph.
      • A node for each temporary.
      • An edge between t1 and t2 if they are live simultaneously at some point in the program.
    • This is the register interference graph(RIG)
      • Two temporaries can be allocated to the same register if there is no edge connecting them
  • Sample
    • b and c cannot be in the same register
    • b and d could be in the same register

Graph Coloring

  • Coloring
    • A coloring of a graph is an assignment of colors to nodes, such that nodes connected by an edge have different colors.
    • A graph is k-colorable if it has a coloring with k colors.
    • NP-Hard
  • RIG is a coloring problem.
  • Heuristic
    1. Decide coloring orders
      • Pick a node t with fewer than k neighbors
      • Put t on a stack and remove it from the RIG
      • Repeat until the graph is empty
    2. Assign the color
      • Start with the last node added
      • At each step pick a color different from those assigned to already colored neighbors

Spilling

Introduction

  • If the graph coloring heuristic fails to find a coloring, then we can’t hold all values in registers.
    • Some values are spilled to memory.
  • What if all nodes have k or more neighbors?
    • Pick a node as a candidate for spilling
    • A spilled value lives in memory
    • Assume f is chosen. Remove f and continue the simplification.
  • If optimistic coloring fails, we spill f.
    • Allocate a memory location for f like $fa.
    • Before each operation that reads f, insert f = load fa.
    • After each operation that writes f, insert store f, fa.
    • Note f has been split into three temporaries.
  • fi is live only
    • Between a fi = load fa and the next instruction
    • Between a store fi, fa and the preceding instr.
  • Spilling reduces the live range of f
  • Additional spills might be required before a coloring is found
  • The tricky part is deciding what to spill
  • Possible heuristics
    • Spill temporaries with most conflicts
    • Spill temporaries with few definitions and uses
    • Avoid spilling in inner loops

Remarks

  • Register allocation is a must have in compilers
    • Because intermediate code uses too many temporaries
    • Because it makes a big difference in performance
  • Register allocation is more complicated for CISC machines.

Managing Caches

Introduction

  • Compilers are very good at managing registers.
  • Compilers are not good at managing caches.
    • This problem is still left to programmers.
  • Example
    • Before: for(j = 0 -> 9) for(i = 1 -> 10000) a[i] *= b[i];
    • After: for(i = 1 -> 10000) for(j = 0 -> 9) a[i] *= b[i];
    • Might be more than 10x faster.
  • A compiler can perform this loop interchange optimization.

Garbage Collection

Automatic Memory Management

Introduction

  • C and C++ programs have many storage bugs
    • forgetting to free unused memory
    • dereferencing a dangling pointer
    • overwriting parts of a data structure by accident
  • When an object is created, unused space is automatically allocated
  • Some space is occupied by objects that will never be used again
    • This space can be freed to be reused later
  • An object x is reachable if and only if:
    • a register contains a pointer to x, or
    • another reachable object y contains a pointer to x
  • An unreachable object can never be used

Managing Memory in Cool

  • Uses an accumulator
    • it points to an object
    • and this object may point to other objects, etc.
  • And a stack pointer
    • each stack frame contains pointers, e.g., method parameters
    • each stack frame also contains non-pointers, e.g., return address
  • Start tracing from acc and stack (roots)
  • Every garbage collection scheme has the following steps
    1. Allocate space as needed for new objects
    2. When space runs out:
      1. Compute what objects might be used again (generally by tracing objects reachable from a set of root registers)
      2. Free the space used by objects not found in (1)
  • Some strategies perform garbage collection before the space actually runs out.

Mark and Sweep

Introduction

  • Algo
    • The Mark Phase: trace reachable objects
    • The Sweep Phase: collect garbage objects
  • Every object has an extra bit: the mark bit
    • Initial: 0
    • Set to 1 for the reachable objects in the mark phase
  • Any garbage object is added to the free list
  • Objects with a mark bit 1 reset to 0
  • A serious problem with the mark phase
    • it is invoked when we are out of space
    • yet it needs space to construct the queue in the mark phase
    • the size of the todo list is unbounded, so we cannot reserve space for it a priori
  • The todo list is used as an auxiliary data structure to perform the reachability analysis
  • There is a trick that allows the auxiliary data to be stored in the objects themselves
  • Space for a new object is allocated from the new list
    • a block large enough is picked
    • an area of the necessary size is allocated from it
    • the left-over is put back in the free list
  • Mark and sweep can fragment the memory

Stop and Copy

Introduction

  • Memory is organized into two areas
    • old space: used for allocation
    • new space: used as a reserve for GC
  • GC starts when the old space is full
  • Copies all reachable objects from old space into new space
  • After the copy the roles of the old and new spaces are reversed and the program resumes
    • Use new space as old space; use old space as new space.
  • Find all the reachable objects: mark and sweep
    • Fix pointers
  • Algo
    1. Copy the objects pointed to by roots and set forwarding pointers
    2. Follow the pointer in the next unscanned object (A)
      • copy the pointed-to objects
      • fix the pointer in A
      • set forwarding pointer
    3. Follow the pointer in the next unscanned object
  • Must be able to tell how large an object is when we scan it

Conservative Collection

Introduction

  • GC relies on being able to find all reachable objects
  • In C or C++ it is impossible to identify the contents of objects in memory
  • But it is Ok to be conservative
    • if a memory word looks like a pointer, it is considered a pointer
      • it must be aligned
      • it must point to a valid address in the data segment
    • all such pointers are followed and we overestimate the set of reachable objects
  • But we still cannot move objects because we cannot update pointers to them

Reference Counting

Introduction

  • Try to collect an object when there are no more pointers to it
  • Store in each object the number of pointers to that object
    • this is the reference count
  • Advantages
    • easy to implement
    • collects garbage incrementally without large pauses in the execution
  • Disadvantages
    • cannot collect circular structures
    • manipulating reference counts at each assignment is very slow
  • Automatic memory management prevents serious storage bugs
  • But reduces programmer control
    • layout of data in memory
    • when is memory deallocated
  • Pauses problematic in real-time applications
  • Memory leaks possible (even likely)
  • Garbage collection is very important
  • There are more advanced garbage collection algorithms
    • concurrent: allow the program to run while the collection is happening
    • generational: do not scan long-lived objects at every collection
    • real time: bound the length of pauses
    • parallel: several collectors working at once

In [ ]: